Linear Regression with
multiple variables

Multiple features

Multiple features (variables).


Size (feet^2)    Price ($1000)
$x$              $y$
2104             460
1416             232
1534             315
852              178
...              ...

$h_\theta(x) = \theta_0 + \theta_1 x$



$n$ = number of features

$x^{(i)}$ = input (features) of the $i$-th training example (a vector)
$x_j^{(i)}$ = value of feature $j$ in the $i$-th training example.

Hypothesis:

Previously with a single feature:

$h_\theta(x) = \theta_0 + \theta_1 x$

while with multiple features:

$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \cdots + \theta_n x_n$

For convenience of notation, define $x_0 = 1$.

Multivariate linear regression.
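
With $x_0 = 1$, the multi-feature hypothesis is the inner product of the parameter vector and the feature vector, $h_\theta(x) = \theta^T x$. The following is a minimal Python sketch of that idea, assuming plain lists for vectors. The function name `hypothesis` and the parameter values are illustrative assumptions, not fitted values from the slides.

```python
def hypothesis(theta, features):
    """Return h_theta(x) = theta_0 * x_0 + ... + theta_n * x_n with x_0 = 1."""
    x = [1.0] + list(features)          # prepend the constant feature x_0 = 1
    if len(theta) != len(x):
        raise ValueError("theta must have n + 1 entries for n features")
    return sum(t * xj for t, xj in zip(theta, x))


# Hypothetical parameters for two features (size, bedrooms); not fitted values.
theta = [80.0, 0.1, 10.0]
prediction = hypothesis(theta, [2104, 5])
```

Prepending $x_0 = 1$ inside the function keeps the intercept $\theta_0$ in the same sum as the other terms, which is the convenience the notation is meant to provide.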
Linear Regression with
multiple variables

Gradient descent for
multiple variables


Machine Learning
Parameters: $\theta_0, \theta_1, \ldots, \theta_n$

Cost function:

$J(\theta_0, \theta_1, \ldots, \theta_n) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$

Gradient descent:
Repeat {

$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta_0, \ldots, \theta_n)$

}
(simultaneously update for every $j = 0, \ldots, n$)



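Previously ($n = 1$):

Repeat {

$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)$

$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x^{(i)}$

(simultaneously update $\theta_0$, $\theta_1$)

}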
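New algorithm ($n \ge 1$):
Repeat {

$\theta_j := \theta_j - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}$

(simultaneously update $\theta_j$ for $j = 0, \ldots, n$)

}

Written out for the first few parameters:

$\theta_0 := \theta_0 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_0^{(i)}$

$\theta_1 := \theta_1 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_1^{(i)}$

$\theta_2 := \theta_2 - \alpha \frac{1}{m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_2^{(i)}$

...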
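
The update rule above can be sketched in plain Python as follows. This is a minimal illustration rather than a reference implementation: the function names, the toy data (generated from $y = 1 + 2x_1 + 3x_2$), the learning rate, and the iteration count are all assumptions chosen for the example. Each row of `X` already includes $x_0 = 1$.

```python
def predict(theta, x):
    """h_theta(x) for one example x that already includes x_0 = 1."""
    return sum(t * xj for t, xj in zip(theta, x))


def cost(theta, X, y):
    """J(theta) = 1/(2m) * sum of squared errors."""
    m = len(y)
    return sum((predict(theta, xi) - yi) ** 2 for xi, yi in zip(X, y)) / (2 * m)


def gradient_descent(X, y, alpha, iterations):
    """Run batch gradient descent; return final theta and the cost history."""
    m, n_plus_1 = len(y), len(X[0])
    theta = [0.0] * n_plus_1
    history = []
    for _ in range(iterations):
        errors = [predict(theta, xi) - yi for xi, yi in zip(X, y)]
        # Compute every partial derivative from the old theta first,
        # then assign, so that all theta_j are updated simultaneously.
        gradient = [
            sum(e * xi[j] for e, xi in zip(errors, X)) / m
            for j in range(n_plus_1)
        ]
        theta = [t - alpha * g for t, g in zip(theta, gradient)]
        history.append(cost(theta, X, y))
    return theta, history


# Toy data generated from y = 1 + 2*x1 + 3*x2 (an assumption for illustration).
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
y = [1, 3, 4, 6, 8]
theta, history = gradient_descent(X, y, alpha=0.1, iterations=5000)
```

Building the whole `gradient` list before touching `theta` is what makes the update simultaneous. Updating `theta[0]` in place and then using it to compute the gradient for `theta[1]` would be a different algorithm.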


Feature Scaling

Idea: Make sure features are on a similar scale.

Scale by dividing the actual feature value by the feature's maximum value.

E.g. $x_1$ = size (0-2000 feet^2)
     $x_2$ = number of bedrooms (1-5)

$x_1 = \frac{\text{size (feet}^2)}{2000}$

$x_2 = \frac{\text{number of bedrooms}}{5}$

Get every feature into approximately a $0 \le x_i \le 1$ range.
Mean normalization

Replace $x_i$ with $x_i - \mu_i$ to make features have approximately zero mean (do not apply to $x_0 = 1$).

E.g. $x_1 = \frac{\text{size} - 1000}{2000}$

     $x_2 = \frac{\#\text{bedrooms} - 2}{5}$

Approximately: $-0.5 \le x_1 \le 0.5$, $-0.5 \le x_2 \le 0.5$

More generally, replace $x_j$ with $(x_j - \mu_j)/s_j$, where $\mu_j$ is the average value of $x_j$ and $s_j$ is either the range (max - min) or the standard deviation.
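
The general rule $(x_j - \mu_j)/s_j$ can be sketched in Python as below. This is a minimal example under stated assumptions: the helper name `mean_normalize` and its `use_std` flag are hypothetical, and the four sizes are the ones listed in the housing table.

```python
import statistics


def mean_normalize(column, use_std=False):
    """Return (scaled values, mu, s) with scaled = (x - mu) / s."""
    mu = statistics.mean(column)
    # s is the range (max - min) by default, or the population standard
    # deviation when use_std is True.
    s = statistics.pstdev(column) if use_std else max(column) - min(column)
    if s == 0:
        raise ValueError("feature is constant; it cannot be scaled")
    return [(x - mu) / s for x in column], mu, s


sizes = [2104, 1416, 1534, 852]        # size in feet^2, from the table
scaled, mu, s = mean_normalize(sizes)
```

The function returns `mu` and `s` along with the scaled values because a new input must be scaled with the same constants before it is passed to the learned hypothesis.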
Gradient descent

$\theta_j := \theta_j - \alpha \frac{\partial}{\partial \theta_j} J(\theta)$

- "Debugging": How to make sure gradient descent is working correctly.

- How to choose the learning rate $\alpha$.
Making sure gradient descent is working correctly.

[Plot: $\min_\theta J(\theta)$ versus number of iterations]

Plot $J(\theta)$ as a function of the number of iterations.

Gradient descent is working correctly when $J(\theta)$ decreases after every iteration.

Gradient descent not working ($\alpha$ is too large to converge). Use smaller $\alpha$.

Gradient descent may overshoot the minimum in each iteration. Use smaller $\alpha$.

- For sufficiently small $\alpha$, $J(\theta)$ should decrease on every iteration.

- But if $\alpha$ is too small, gradient descent can be slow to converge.
Summary:

- If $\alpha$ is too small: slow convergence.

- If $\alpha$ is too large: $J(\theta)$ may not decrease on every iteration; may not converge.

To choose $\alpha$, try

..., 0.001, ..., 0.01, ..., 0.1, ...
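
One way to apply this advice is to record $J(\theta)$ after every iteration for several candidate values of $\alpha$ and check whether it decreases each time. The sketch below does this for a single-feature model. The toy data ($y = 2x$), the helper names, and the extra candidate $\alpha = 1.0$ (included only to show a run that diverges) are assumptions made for illustration.

```python
def cost_history(xs, ys, alpha, iterations):
    """Gradient descent on h(x) = t0 + t1*x; return J(theta) after each step."""
    m = len(xs)
    t0 = t1 = 0.0
    history = []
    for _ in range(iterations):
        errors = [t0 + t1 * x - y for x, y in zip(xs, ys)]
        grad0 = sum(errors) / m
        grad1 = sum(e * x for e, x in zip(errors, xs)) / m
        t0, t1 = t0 - alpha * grad0, t1 - alpha * grad1   # simultaneous update
        new_errors = [t0 + t1 * x - y for x, y in zip(xs, ys)]
        history.append(sum(e * e for e in new_errors) / (2 * m))
    return history


def decreases_every_iteration(history):
    """True when J never goes up from one iteration to the next."""
    return all(later <= earlier for earlier, later in zip(history, history[1:]))


# Toy data (assumed for illustration): y = 2x on x = 0..4.
xs = [0, 1, 2, 3, 4]
ys = [0, 2, 4, 6, 8]

for alpha in (0.001, 0.01, 0.1, 1.0):
    history = cost_history(xs, ys, alpha, iterations=50)
    print(alpha, decreases_every_iteration(history), history[-1])
```

On this data the smallest rate decreases $J(\theta)$ on every step but has made little progress after 50 iterations, while $\alpha = 1.0$ makes $J(\theta)$ grow. These are the two failure modes listed in the summary.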
Linear Regression with
multiple variables


Features and
polynomial regression


Housing prices prediction

$h_\theta(x) = \theta_0 + \theta_1 \times \text{frontage} + \theta_2 \times \text{depth}$


Area: $x = \text{frontage} \times \text{depth}$


$h_\theta(x) = \theta_0 + \theta_1 x$
Polynomial regression

[Plot: Price ($y$) versus Size ($x$)]

$h_\theta(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2 + \theta_3 x_3$

$= \theta_0 + \theta_1 (\text{size}) + \theta_2 (\text{size})^2 + \theta_3 (\text{size})^3$

$x_1 = (\text{size})$

$x_2 = (\text{size})^2$
$x_3 = (\text{size})^3$

$\theta_0 + \theta_1 x + \theta_2 x^2$

$\theta_0 + \theta_1 x + \theta_2 x^2 + \theta_3 x^3$
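
Because the polynomial model is still linear in $\theta$, it can be fitted with the same machinery once the powers of the size are treated as separate features. Below is a minimal Python sketch of that feature construction. The helper name `polynomial_features` is hypothetical, and dividing each column by its maximum is just one of the scaling options described earlier.

```python
def polynomial_features(size, degree=3):
    """Map a single input to [size, size**2, ..., size**degree]."""
    return [size ** p for p in range(1, degree + 1)]


sizes = [2104, 1416, 1534, 852]                 # feet^2, from the table
rows = [polynomial_features(s) for s in sizes]

# The three columns differ by orders of magnitude, which is why feature
# scaling matters here. Dividing each column by its maximum is one option.
col_max = [max(col) for col in zip(*rows)]
scaled = [[v / mx for v, mx in zip(row, col_max)] for row in rows]
```

For a size near 1000 the three features are roughly $10^3$, $10^6$, and $10^9$, so gradient descent on the unscaled columns would likely need a very small $\alpha$.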
Linear Regression with
multiple variables

Normal equation

Normal equation:

Method to solve for $\theta$ analytically.
        Size (feet^2)   Number of bedrooms   Number of floors   Age of home (years)   Price ($1000)
$x_0$   $x_1$           $x_2$                $x_3$              $x_4$                 $y$
1       2104            5                    1                  45                    460
1       1416            3                    2                  40                    232
1       1534            3                    2                  30                    315
1       852             2                    1                  36                    178
1       3000            4                    1                  38                    540

$X = \begin{bmatrix} 1 & 2104 & 5 & 1 & 45 \\ 1 & 1416 & 3 & 2 & 40 \\ 1 & 1534 & 3 & 2 & 30 \\ 1 & 852 & 2 & 1 & 36 \\ 1 & 3000 & 4 & 1 & 38 \end{bmatrix}$

$y = \begin{bmatrix} 460 \\ 232 \\ 315 \\ 178 \\ 540 \end{bmatrix}$

$\theta = (X^T X)^{-1} X^T y$
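
The formula can be sketched in plain Python as below. Note the swapped-in technique: instead of forming $(X^T X)^{-1}$ explicitly, the code solves the linear system $(X^T X)\theta = X^T y$ by Gaussian elimination, which gives the same $\theta$ whenever $X^T X$ is invertible. The function names, the fixed pivot tolerance, and the toy data (generated from $y = 1 + 2x_1 + 3x_2$) are assumptions for illustration.

```python
def transpose(A):
    return [list(col) for col in zip(*A)]


def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(bi)] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:   # crude absolute tolerance (assumption)
            raise ValueError("X^T X is singular or nearly singular")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):         # back substitution
        tail = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z


def normal_equation(X, y):
    """Return theta satisfying (X^T X) theta = X^T y."""
    Xt = transpose(X)
    XtX = matmul(Xt, X)
    Xty = [sum(a * b for a, b in zip(row, y)) for row in Xt]
    return solve(XtX, Xty)


# Toy data generated from y = 1 + 2*x1 + 3*x2 (an assumption for illustration).
X = [[1, 0, 0], [1, 1, 0], [1, 0, 1], [1, 1, 1], [1, 2, 1]]
y = [1, 3, 4, 6, 8]
theta = normal_equation(X, y)
```

No learning rate and no iteration loop appear here, but the elimination step costs roughly $n^3$ operations in the number of features.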
$m$ training examples, $n$ features.

Gradient Descent

• Need to choose $\alpha$.

• Needs many iterations.

• Works well even when $n$ is large.

Normal Equation

• No need to choose $\alpha$.

• Don't need to iterate.

• Need to compute $(X^T X)^{-1}$.

• Slow if $n$ is very large.